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^^ I Higher order correlation measurements involve multiple event averages which 

must run over unequal events to avoid statistical bias. We derive correction 

■^ ■ formulas for small event samples, where the bias is largest, and utilize the 

^^ . results to achieve savings in CPU time consumption for the star integral. 

^ I Results from a simple model of correlations illustrate the utility and impor- 

I ' tance of these corrections. Single-event correlation measurements such as in 

g^: galaxy distributions and envisaged at RHIC must take great care to avoid 

T^ • this unnecessary pitfall. 
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I. INTRODUCTION 

In the hope of obtaining new insights into the old problem of soft interactions in high 
energy physics, there has been much interest in multiparticle correlations in the last few 
years, spurred by new theoretical perspectives and a large amount of multiparticle data 
in hadronic and nuclear collisions |l|J3. While various Monte Carlo codes and analytical 
models often yield very similar behavior in rapidity and p± distributions, they predict widely 
differing particle correlations. Experimentally measured correlations are therefore becoming 
an important and severe test of such theoretical models. 

Experience has shown, however, that correlation measurements require considerably more 
subtle and sophisticated understanding of statistics than single-particle quantities do, and 
there has been much improvisation in methodology and interpretation of data. A clean and 
consistent statistical basis for such methodology has become a matter of urgency. 

Recently, we have shown how, through the use of the correlation integral, the measure- 
ment of multiparticle correlations can be greatly improved, both in conventional variables 
such as rapidity and azimuthal angle and in terms of relative momenta used in pion 
interferometry Q. By deriving all quantities from first principles, our techniques, besides 
greatly improving the accuracy of correlation measurements, permit for the first time the 
direct measurement of cumulants. Moments, while easily measured, contain lower-order 
correlations. Cumulants, testing the actual correlations, are to be preferred, but they are 
hard to implement for at least two reasons: they contain a hidden statistical bias and are 
expensive in terms of CPU time. 

The mentioned bias is present in all correlation measurements; it is large for small data 
samples and strong correlations while becoming negligible for large samples and weak corre- 
lations. Our analysis provides the framework for understanding and dealing with this bias 
in any present or future data set. 

Secondly, correlation integral algorithms, while much superior to conventional methods, 
run at least as the square of the event multiplicity and the sample size A^ev In understanding 
this bias, we point the way to huge reductions in computer time also. Defining for inner 
event averages a "reduced sample average" containing only A events, and correcting for the 
resulting bias, we obtain, compared to full event mixing, savings of a factor N^^/A for the 
star integral. For a typical case with A'^ev = 10^ events and A = 100, the savings amount to 
a factor 1000 over full event mixing. 

Besides the bias under discussion, there clearly are other biases, both statistical and 
systematic, which greatly influence multiparticle correlations. Typical unwanted but often 
important effects include the "empty bin effect" P] and contamination by trivial sources of 
particle correlations such as Dalitz decays and gamma conversion |^ or the misidentification 
of pieces of a single track as two (highly correlated) particles 0. All these have been shown 
to be capable of drowning other correlations in the background. Eliminating such biases is 
therefore a sine qua non of multiparticle correlations. We take here a simple model of such 
correlations, the split track model [Q , to illustrate both the use of the reduced event average 
with bias correction and the effect such contamination may have on correlation data. 

In Section ||, we first explain the use and significance of unbiased estimators and find 
a general form for unbiased estimators of products of densities. We develop the general 
formalism in Section |Tl| and apply these in Section |^ to the star integrals. An example 



of behavior of the star integral as apphed to the spht track model is given in Section 0, 
followed by an outline of steps needed to measure unbiased correlations in truly small samples 
and a brief discussion of corrections for other correlation methods. We conclude with some 
comments on small samples and single-event measurements. First results regarding unbiased 
estimators can be found in Ref. [Q. More recently, this formalism has been applied to the 
problem of normalization in a fixed-bin context [|l^ . 

II. UNBIASED ESTIMATORS FOR PRODUCTS OF DISTRIBUTIONS 

We briefly remind the reader of some basics of statistical theory. Suppose we have a 
random variable U which for a given trial (or "event" in the parlance of high energy physics) 
takes on a value U. For a finite number of events A'^ev, the set of values of U make up a 
sample, for which the sample average of U can be found, {U)s = Se f^e/^ev By carrying 
out an infinite number of trials (the population) , one can theoretically determine the "true" 
behavior U of the random variable. The expectation value E[U] of a quantity U is the value 
found over an infinite number of trials, 

U = E[U]= lim J2Ue/N,,. (1) 

e=l 

An experimental sample invariably consists of a finite number of events, so that E[U] can- 
not be found directly. A large part of statistics occupies itself with the question how the 
information contained in a limited sample can be extrapolated to estimate its true behavior 
over the whole population. Rather than taking the limit A'ev — > C)0, one imagines that there 
are J\f samples, each with N^^ events and a particular value of {U)s for each. These sample 
averages themselves form a distribution, the sampling distribution. For infinite A'ev, the 
sampling distribution of course narrows to a delta function centered on U, but for finite N^.^ 
the sampling distribution has a nonzero width independent of the number of samples A/". 

There is no way to ascertain where the (f/)s obtained for one experimental sample will 
fall in this distribution, i.e. one can never claim with certainty that {U)s = U . All that can 
be achieved is to make sure that, even for finite iVev, the sampling average of the sampling 
distribution 

equals the true value U . Surprisingly, this is not generally true: for finite A'ev, {U} is not 
necessarily equal to U . When it is not, U is termed a biased estimator oi U, and one attempts 
to find a corresponding unbiased estimator e{U) which does fulfil the condition 



{e(f/)} 



U for all finite A^ev (3) 



Note: Here and throughout this paper, we use the shortened notation e(f7) to denote the 
unbiased estimator for the true value U , i.e. the U inside the brackets is not the argument 
of e but the desired result. The set of Ue of the experimental sample make up the arguments 
of e, which in full notation should be written as efjiUi, U2, . . . , Un^^)- 



For the case of inultiparticle physics, the basic random variables U correspond to the 
one-particle inclusive density of event e, 



N 



pl{x)=J2S{x,-X^,), (4) 

where Xi are the set of measured coordinates of the N particles of the event, and the 
corresponding g-th order densities for the event e, 

Plix,,...,x,)= Y. Six, -XI)... Six, -XI). (5) 

These yield the sample average Pg = {Pq)s = Z^ePg/^ev, identified with the usual experi- 
mental inclusive density 

PgiXi,X2,...,Xg) = —- -— (6) 

aj dxi dx2 ■ ■ ■ dXq 

which is normalized to the factorial moment of the event multiplicity N, J pg = {N^'^^) = 
{NiN-l)...iN-q + l)). 

It has long been known that the inclusive density is an unbiased estimator for the true 
value, {pg} = Pg, and so little attention has been paid to the theory of estimators in high 
energy physics. Unlike a single inclusive density, however, a product of two or more densities 
is a biased estimator. This we illustrate for the simple example of the product of two single- 
particle densities pi(cci)pi(a;2) before considering the general case. The sampling average of 
PiPi is 



{piPi} ={n-'T.pTp?] 

I 61,62 J 



N-' T.pTp?\ + \K^'T.PiPi\, (7) 

617^62 J I 61 J 

i.e. there are A'^ev out of the total N^^ terms in which the two pi's refer to the same event 
and thus effectively introduce a correlation. Because the densities of different events are 
independent, the sampling average of their product factorizes, yielding the true inclusive 
densities,[] 

{pTp?} = {pT}{p?} = PiPi ifeiT^e^ (8) 

so that 

{piPi} = (1 - N-')p,p, + N-^' {pl^pl^} , (9) 



^ The true value pg can be written as the sampUng average of either the sample-averaged density 
or of the single-event density, pg = {pg} = {pq}, because Eq. (|3|) is valid for single-event "samples" 
A'pv = 1 also. 



meaning that {pipi} is not equal to pipi and thus pipi is a biased estimator for the latter. 
The culprit is clearly the equal-event part in Eq. (^. For just one available sample, the 
needed unbiased estimator for the true value pipi is the unequal-event sum 

e(piPi) = i E Pl'P? ■ (10) 

The above simple example generalizes to the following result: Given a product of K inclusive 
densities of order qi, q2, . ■ ■ ,qK, respectively, the unbiased estimator for the product of true 
values is given by 

e{pg^Pg, ■ ■ ■ Pck) = —[K] E PIIpII ■■■PTk'^ (H) 

iVcv eii^e2^...^eK 

for example, the unbiased estimator for P2P1P1 will be given by Y^eiytez^e-j, pV^pTpT /^3- This 
equation is the most important point of our paper. In the following sections, we explore the 
consequences for various correlation measurements of taking only unequal events in products 
of densities. 

Products such as in Eq. ( pAj ) can be written in terms of event mixing, a procedure used 
heuristically before to normalize correlation measurements. From here on, we distinguish 
three different kinds of event mixing: Denoting the first event average by the index a and 
subsequent averages by 6, c, full event mixing is given by running all indices over the full 
sample with N^^ terms, 

— y — - — y — - — E ■ • • , (12) 

the reduced event average runs the inner event averages over A events only, 

Ncv 1 a— 1 1 a— 1 

E- E - 

^ev a=l ^ b=a-A ^ 



' Et E T^ E ■••. (13) 

c=a—A 
c^b 



while fake event mixing selects randomly a track from each of N different events (where 
N itself must follow a Poisson distribution) and does the standard analysis on a sample of 



such fake events [|rT|. While full event mixing is exact, it is feasible only for small samples, 
so that in practice the reduced average or fake event procedures are chosen. The latter is 
easy to understand and implement for the normalization pi, but hard to implement for the 
cumulant expansions introduced below. We shall concentrate therefore on using the reduced 
event average. 

III. CORRECTION TERMS FOR K-FOLB PRODUCTS 

Before going into the details of unbiased estimators for the various correlation measure- 
ments in current use, we establish the general framework for these corrections which will be 
applicable for all occurrences of products of random variables. To simplify notation, we write 



for the single-event inclusive densities Pq the variables ?7, V", W ^ . . . and (f/)^ = A^ev^ I^e ^e 
the sample averages; the desired true values are [7, V etc. As in Section ^ the desired un- 
biased estimator for a given product is obtained when the single factors come from different 
events, as then the sampling average factorizes, 

and, to make full use of all events in the sample, the sums over all (unequal) events are 
introduced. Products of experimentally measured inclusive densities, on the other hand, 
have unrestricted sums, so that it is necessary to expand the unequal-event sums in terms 
of unrestricted ones. Writing the Kronecker delta 5e-^e2 as 5i2 for short, 812^ = 6ej^e2^e2e3 and 
so on, we have for the double sum 

E =E- E =E(i-M- (15) 

ei7^e2 £1^2 ei=e2 eie2 

i.e. the factor (1 — 612) forces the unrestricted sum to the unequal-event sum. In third order, 
the corresponding combinatorics are 



ei = 62 7^ 63 

ei = 63 7^ 62 

62 = 63 7^ 61 

ei = 62 = 63 
ei 7^ 62 7^ 63 



(5i2(l - 623) 
(5i3(l - ^23) 
^23(1 - 5i3) 

^123 

1- 5i2- 613 - 623 + 2^123 



where the last line is obtained from the previous ones by requiring that all cases have to 
add up to 1, so that 

E = E [1 - ^12 - ^13 - ^23 + 26,23], (16) 

617^627^63 616263 

while in fourth order, 

E = E [l-E^12 + 2E^123 + E'^12^34- 6^1234]; (17) 

617^627^637^64 61626364 (6) (4) (3) 

the brackets under the sums indicating the number of permutations to be taken. 

These expansions are utilized as follows. Let A be the number of events over which an 
average is performed, (U) = Y.eU^/A (this differs from the full sample average (f/)^ when 
doing reduced event mixing). The unbiased estimator for IJV is expanded in second order 
to 

<UV) = -Ij E U'-y-' = ^,(U){V) - ^^{UV) , (18) 

617^62 

and with v4VA[2] = 1 + 1/{A-1), 

eiUV) = (U) {V) - j^^2iU, V) , (19) 



where 

n,(U,V)^{UV)-{U){V), (20) 

i.e. we get a correction consisting of a second-order correlation, suppressed by a factor (A—l). 
Using Eq. (|16D and expanding A^/A^l = 1 + 3/(y4-l) +4/(y4-l)[^] etc., we get for the third 
order unbiased estimator 

e(UVW) = 4^1 y f/'^V^'W'^ 
API J-", 

= (U) (V) {W) - j^ Y. ^^(U, V) {W) + (^_\)[2i ^3(^, V, W) , (21) 

where 

Ks{U, V, W) = (UVW) - {(JV) {W) - {WU) {V) - {VW) (U) + 2{U) {V) {W) (22) 

is a third-order correlation, suppressed in Eq. (|21| ) by a factor l/(yl — 1)'^^ In fourth order, 
with 

K4{U,V,W,X) = {UVWX) -J2{UVW){X) 

(4) 

- Y.{UV){WX) + 2Y^{UV){W){X)-6{U){V){W){X), (23) 

(3) (6) 



we have 



e{UVWX) = {U){V){W){X) 

-^j:n,{U,V){W){X) 
^ ~ Me) 

+ M \v2] I 2 E «^3(f>, V^, #) (X) + ^ K,iU, V)K2iW, X) 

^^~^^ V (4) (3) 

- TX^TW (^'^^(t/, V^, #, X) + 3 ^ /.^(t/, V)«:2(H^, X) | . (24) 



IV. BIAS CORRECTIONS FOR THE STAR INTEGRAL 

As stressed previously, the quantity underlying all correlation measurements is the in- 
clusive density pq. Bose- Einstein measurements [^, fixed-bin factorial moments [Q] and cu- 
mulants [Q, as well as correlation integrals |Q all sample Pq in the form of (unnormalized) 
factorial moments 

^q(r2) = / dXi dX2... dXq Pq{Xi, X2, . . . , Xg) . (25) 



The only difference between these different correlation measurements lies in the different 
choice of integration domain Q. 

To explore the utility of unbiased estimators, let us look at the so-called star integral, a 
particular method for measuring multiparticle correlations 0]. The domain Q for the star 
integral is given by the sum of all spheres of radius e centered around each of the A^ particles 
in the event.0 The number of particles ("sphere count") within each of these spheres is, not 
counting the particle at the center Xj^, 

N 

n(X,„ e) = E 0(e - l^n " X,,\) , t^ ^ H , (26) 

12 = 1 

and the factorial moment of order q is 

er(e) = (E^(^n,e)l«^) . (27) 

This can be derived rigorously [Q from Eq. (|^) using for ^2 the equivalent definition 

C(e) = / P^(^i' • • • ' ^«) ©12013 ...Qi.dx,... dx, , (28) 

with the theta functions Gi^ = G(e — \xi — Xj\) restricting all g— 1 coordinates Xj to within 
a distance e of a^i. 

For various reasons, it has become customary in high energy physics to measure normal- 
ized factorial moments [0]. Dividing by the integral of the uncorrelated background p{ over 
the same domain, the normalized star integral factorial moment is 

™tar/ N ^ C' ^ / Pg(a;i, ...,Xq) 9l29l3 ...QlqdXi... dXg 

' ^^"er" IPiixi)...p,ix^)Qu<di3---Qi,dx,...dx,' ^ ^ 

where the denominator Q°'^'^ is given by the double event average 

erne) = (E(E0(^-^nl))' ) ^(T.{Mxi,e)y'') , (30) 

\ il \ «2 / I g \ ii I g 

with Xf^^^ = |X? — X\^\ measuring the distance between two particles taken from different 
events a and h. The (full A'^ev) outer event average and sum over ii are taken over the center 
particle taken from event a, each of which is used as the center of sphere counts nb(X" , e) 
taken over all events h in the (reduced) inner event average. 

Having defined our terms, let us now analyse them from the point of view of estimators. 
Because pq is an unbiased estimator for the true Pg, the numerator ^f"^^ is also unbiased and 



^ When a particle is closer than e to the overall domain boundaries, the sphere around it is 
truncated by the latter, so that this definition is rigorous only for an infinite domain. Boundary 
effects are, of course, the scourge of many correlation measurements, even in astronomy [|l^]. Eq. 
(|28| ) is rigorous for all domain sizes. 

8 



does not need correction. The denominator Q""^™, however, is an integral over the biased 
estimator pi{xi) ■ ■ ■ pi{xg). To shorten notation, we abbreviate the sphere counts introduced 
previously by 

a^^e(e-X-) = n(X^e), j^z (31) 

j 

5^^e(6-Xj)=n,(X^e), (32) 

j 

so that the uncorrected normalization is Q""^™ = {J2i{b)'^^^) g for short. The term inside the 
outer event average we write as ^g = {b)'^~^, a (g— l)-fold product. Inserting these fe's into 



Eqs. (p!9D, (pl|) and (p^, unbiased estimators for the normalization moments are found to 

beg 



ei^rn - 


-{b), 


eiirn - 


-Ah? 


e(er-) = 


-{hf 


eiirn - 


-{b? 



i^2{b,b) 
(A-1)' 

3{b)K2ib,b) 2Ks{b,b,b) 

(A-l) (A-1)[21 ' 

6{b)^K2{b,b) 8{b)K3{b,b,b)+34{b,b) 

(A-l) ^ (A-l)[2] 

6K4{b,b,b,b) +9Kl{b,b) 

(A-l) [3] ' 



(33) 
(34) 

(35) 



(36) 



where the definitions of Kg are given in Eqs. (pOf), (|2^ ) and (^). In other words, the 
naive normalization {b)'^~^ is corrected by correlations of order q—1 and lower, suppressed 
by powers of A. The sample-averaged unbiased estimator for the normalization is then 
g^^norm^ _ (J^i g{Q°^^)) s, and the bias-corrected normalized star integral is the ratio 

fstar 

e(F„) = ^i=- . (37) 



A further possible bias must be tested, and, if necessary, corrected for. Both numerator 
«tar ^^^ normalization use the same sample, and thus will also contain a residual corre- 
lation by referring to the same event during their respective averages. The most obvious 
(but probably not the most elegant) way to remove this correlation is to demand that the 
denominator explicitly exclude each event a currently under consideration in the numerator. 
The bottom-line unbiased estimator for the normalized moment is therefore 

1 A^ev fa 

-iF,)^j^^f:- (38) 

^^Gv a=l J^q 



3 To avoid unnecessarily complicated notation, we omit here and below the bar over "hatted" 
quantities inside the brackets. 

9 



where D^ must now be found from a product of q single-particle densities restricted addi- 
tionally by the condition that all sums must exclude event a. 

Consigning the details to the appendix, we here merely state the results. Defining the 
"correction function" g"^ implicitly by 



D: = e(e; 



norm ) 

1 ' 



9a 



N,. 



(39) 



the corrected normalized moment can be written as a geometric series 



iVc- 



<F,) = TrT. 



N^ 



ev a=l 



fa 
So 



e(e 



oo 

■E 

p=0 



,^ev-l 



(40) 



Suppressed by powers of (A^ev— 1), this series converges rapidly except for very small val- 
ues of A^ev This means that the correction due to correlation between numerator and 
denominator can probably be neglected and only the p=0 term corresponding to Eq. ( |57[ ) 
need be kept. Should doubt arise as to the importance of this correlation, the p=l term 
(^q^g)s/(-^ev— 1) e(^^™™) should bc cvaluatcd for the sample in question and compared to 
the lowest-order term. 

Cumulants are combinations of correlation functions constructed in such a way as to 
become zero whenever any one or more of the points x becomes statistically independent 
of the others. This is done so as to strip away the combinatorial background from the 
correlation measurements, 



C2(a3i, X2) = P2iXi, X2) - PliXi)pi(x2) , 
Cs{Xi, X2, X-i) = psiXi, X2, X-i) - pi{Xi)p2iX2, X3) 

- pi{X2)p2iXs,Xi) - pi{x^)p2{xi,X2] 
+ 2pi{Xi)pi{x2)pl{X'i), 



(41) 



(42) 



etc. Using combinations of conventional moments, they have been measured for various 
experiments |T^,n,T^. Integrating to get the unnormalized star integral cumulants fq, 
defined by 



/g(e) = / Cq{Xi, ...,Xq) 012613 ...QiqdXi... dXq , 



we obtained previously p, with fq = {J2i fq{.'i))t 



f2 = a- {b) , 

/3 = a[2]- (6^1) -2a(6)+ 2(6)2 



(43) 



(44) 
(45) 



and so on for higher orders (see below). The second order /2 has only a single event average 
and so is unbiased, e(/2) = f2- Correcting according to Section |T| the last term for /s which 
involves a double event average, the unbiased version becomes (again omitting the bars) 



</3) = a[2]-(6[2])-2a(6)+2(6)^ 



A 



-K2ib,b); 



(46) 



10 



similarly, the unbiased estimators for /4 and /s are found to be 

e(/4) = a[3l - {b^^^) - 3a[2l(6) - 3a{b^^^) + 6{b){b^^^) + Qa{bf - Q{bf 



+ 



6 



(3(6)-a)fi:2(6,6)-ft:2(6,6[2]) 



A-l 
e(/5) = aW - (foW) - 4a[3l(6) - 4a(6[=^]) 



12 



(A-l) [2] 



K3{b,b,b), 



(47) 



- 6a[2](6[2]) + 8(6)(6[=^]) + 12aPl(6)2 + 6(6P1)(6[2]) 
+ 24a(6)(6l2]) _ 36(6)'(6['l) - 2Aa{bf + 24(6)^ 

^ ' ,(^&)(6a'''-18(&f'])-36a(6) + 72(6)^ 



A 



/t2( 



+ 



24 



(y4-l)[2] 
72 



(A-l) [3] 
The normalized cumulants are estimated by 

e(/,) 



+4/€2(6, 6^'^) + 3fi:2(&f'U''') + (12a - 36(6)) /€2(6, &''') 
3^2(6, 6) + (8(6) - 2a)/t3(6, 6, 6) - 3^3(6, b, b^^^) 

2^4(6,6,6,6) +3/t^(6,6) 



e{Kf-{e)) ^ 



e(e, 



norm] 
9 ) 



e(e 



norin\ 



(48) 



(49) 



and must therefore be corrected for bias in both numerator and denominator. For cumulants, 
too, the residual correlation between numerator and denominator can be tested and corrected 
for; as for the moments, we expect this correction to be negligible. See the appendix for 
details. 

A second useful form for star moments and cumulants are the so-called differential mo- 
ments: Here, one defines not only a maximum distance et but a minimum also, et-i {t can 
define a sequence of such distances). For a given combination of q—1 particles around a 
center particle at Xj^ , at least one of these must lie inside the spherical shell bounded by 
radii et-i and et, while the others are restricted only by the maximum distance e^. This 
definition leads rigorously to the simple and efficient prescriptions for measurement of 
the normalized differential moments and cumulants 



AF,{t) 



AK,{t) 



J2i o^t 



[g-i] 



Jg-i] 






-/g(^,et-i) 



E. ihr' - {bt-i) 



q-l 



(50) 



(51) 



using the shorthand at = h{Xi^, et) and 6j = hb{Xi-^, et). Unbiased estimators are found by 
correcting individual terms as set out for moments and cumulants above. 

V. AN EXAMPLE: THE SPLIT TRACK MODEL 

To illustrate the effect of bias and the use of the reduced inner event averages for the star 
integral, we make use of a simple but effective model, invented previously [§] to simulate 
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the effects of spurious correlations introduced by Dalitz decays, gamma conversion and the 
mismatching of tracks by detectors 0. 

For each "event" , the split track model generates P "points" distributed uniformly inside 
a one- dimensional window, with P itself following a poisson distribution. Each of these P 
points is then either with probability g split up into k "particles", all situated at exactly 
the same position, or with probability (l—g) becomes a single "particle". The average 
multiphcity is thus (A^) = {l—g){P) + gk{P). Clearly, the k particles in a cluster are 
maximally correlated, since they always fall within the same sphere, no matter how small 
the radius e. 

This simple model can be solved analytically and is known to yield scaling cumulants 
Kq for q<k, while cumulants of order greater than k are zero exactly |^. 

We created A^ev = 10, 000 events with average total number of points 20 and setting 
^f = 0.1 and A; = 3. This translates to an average total multiplicity {N) = 24. Doing the 
reduced event averages for the inner (6-)event average, only A = 11 events rather than the 
full A^ev were used. This means a savings of CPU time of about a factor 1000 compared 
to full event mixing. Since there are only three particles per cluster, the true cumulants of 
fourth and fifth order are zero exactly. Both second and third order cumulants should be 
nonzero and scaling. 

In Figure 1, we show the effect of bias corrections on the factorial moments Fg and 
cumulants Kg (note the different ^/-scales, both linear!). F2 and K2 have no bias corrections; 
for the higher orders, the difference grows with increasing order q and smaller sphere radius 
e. As expected, the unbiased i^4 and K^ are zero to within statistical errors, while the biased 
K4 and K5 rise strongly. The rise is due entirely to the equal-event bias which is the subject 
of this paper. F4 and F5 contain contributions from second- and third-order correlations 
| I^ and therefore are not zero. 

Note also that the biased estimate lies below the unbiased one for the moments, while for 
the cumulants, it lies above the unbiased estimator. The reason is that the Fg are corrected 
only through the normalization Q""^™, which in Eqs. ([331)^^. are all seen to be corrected 
downwards; the numerator C,f^^ is unbiased. For Kg, on the other hand, both the numerator 
and denominator require bias corrections. 

The corresponding differential moments and cumulants are shown in Figure 2. The most 
important feature is that only the data point corresponding to the smallest e contains the 
split track contributions to AK2 and AK3. This must be so because all three particles 
belonging to a given cluster are by construction separated by zero distance. 

Secondly, the difference between biased and unbiased differentials is much smaller than 
for the corresponding integral quantities Fg and Kg. This is because in Eqs. (^)-(^Tp 
the subtraction of terms {{bt)'^~^ — {bt-iY^^ etc) means that corresponding corrections also 
largely cancel. The only exception is the smallest-e bin where there are no terms fg{i,et-i) 
and bt-i to subtract, so that the bias corrections for this data point remain uncancelled. 

It may be tempting to use a large value A for inner event averages while neglecting bias 
corrections rather than implementing them. That this is usually not helpful is shown by 
Figure 3, where we have plotted the dependence of the (biased, uncorrected) K5 on the 
number of events A taken for the inner event average. Again, the "true" value is i^s = 0. 
Clearly, the resulting curves converge rather slowly to zero even for large A. The unbiased 
K^, however, are virtually indistinguishable for all values of A shown here, meaning that, for 
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the present parameters, even the smallest value A = 11 is sufficient to obtain good results 
if the bias corrections are implemented. The factor 10 in CPU time needed for the A = 101 
case shown is thus largely wasted. The only remaining advantage of using a larger A is that 
statistical errors become smaller (but the mean value remains the same). 

The curves shown here are for the one-dimensional model; for higher dimensions, the 
effect of split tracks on the correlation is much larger since the rise of the cumulants goes 
roughly like e"*^, where d is the dimension of the phase space. 

At this point we also comment on the use of different random number generators. As 
can be seen in Figures 1-3, the fifth order cumulant K^ and differential AK^ show some 
deviation from the theoretical value of zero for small e. We have tested various available 
random number generators with the split track model, using exactly the same parameters 
quoted above. It turns out that the different generators produce substantially different 
results for Kr, at small e, with some deviating above zero, others below, with varying sizes of 
error bars. The calculation of cumulants in the split track model is clearly a very sensitive 
test of the quality of a random number generator, just as it has proven itself in ferreting 
out statistical and systematic experimental biases. A really good random number generator 
should yield results for Kr, within the split track model which are compatible with zero.^ 

We therefore recommend that, before any experimental measurements of correlations are 
attempted and compared to so-called "random" number data, all random number generators 
first be tested whether they produce truly zero cumulants of higher orders. Only when they 
do can any further conclusions as to correlations in the data be drawn. 

VI. VERY SMALL SAMPLES 

When the number of events in the experimental sample becomes very small, of the 
order of 100 or less, full event mixing may become unavoidable. In this case, of course, 
it becomes mandatory to avoid the equal-event bias, otherwise the measurement is simply 
wrong. Because for small samples CPU time is not an issue, the best and most transparent 
method is directly to implement the full unequal-event estimator of Eq. ([TID for all products 
in cumulants and normalization. 

If for higher q it does become advantageous to avoid direct implementation of unequal- 
event algorithms, our procedures can be used in modified form as follows. 

Whereas the above bias corrections assumed that events b,c, . . . were always unequal to 
the "outer" event a, full event mixing must allow and correct for all possible combinations of 
equal and unequal events. Therefore, the simple procedure of using the expansions of UV . . . 
in terms (UV) etc. of Section |ITT| cannot be applied directly; rather, one must start from 
ffist principles and apply the sum combinatorics to all sums. For example, the unbiased 
estimator for the second order normalization becomes, after rearrangement 

jVcv aj^b i,j 



For the present examples, the generator RAN4 from Ref. U& was used 
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T^EE0(^-^?) 



iV2 

ev a,b i,j 



-iE E0(^-^S")-]f EE0(^-^l 

iVev a \ij ^^ev ^ jj 

EW.) -]^r^(E[« + i-W^]) ' (52) 



so that one can infer 



e(er-) = (6)^ - -^[a + 1 - (6).] . (53) 

The extra "1" stems from the fact that the i,j sums are not restricted to unequal particles, 
so that the count always includes the center particle also. Unlike the reduced event mixing 
case of Eq. (^), which run only over the A events following a, the event averages here are 
performed over all A^ev events, including the a = b case. 

Using similar first-principle combinatorics, we find for the full event mixing cumulant 

eih) = a - {b)s + -TT^ia + 1 - {b)s] ■ (54) 

Higher order normalizations and cumulants are derived analogously. 



VII. CORRECTIONS FOR BOSE-EINSTEIN AND OTHER CORRELATIONS 

The prescription that only unequal events be used is of course true for any kind of corre- 
lation measurement. In the case of Bose- Einstein correlations, most experimental measure- 
ments to date are for second order only, where the double event average in the normalization 
is found through fake event mixing. Very few higher order measurements exist, and these 
are in the form of moments rather than cumulants, so that the problem did not arise either 

Recently, we have derived formulae for the direct measurement of cumulants in Bose- 
Einstein correlations [Q. The particular definition used for the g-particle relative four- 
momentum, 

Q'^ J2 -iPa-P^)\ (55) 

a</3=l 

while convenient because it is directly related to the g-particle invariant mass M^ = (pi + 
. . . +pq)'^, does not allow for a factorization of the multiple sums as was the case for the star 
integral. For this reason, there is little sense in deriving corresponding correction formulas; 
rather, one simply must enforce all event sums to refer to unequal events as in Eq. (|1^) and 
do the full g-times event average (or the corresponding reduced version). 

There is one choice of the g-particle four-momentum that does allow for factorization of 
the sums as in the star integral, namely 
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Q2^^-(pi-p,)2. (56) 



j=2 



For this case, corresponding correction formulae can be derived and the savings in CPU time 
achieved. It is unclear, however, whether such choice of variable is preferable to the original 
choice of Eq. (|55D for reasons other than convenience. 

The situation is quite different for the traditional (bin-based) factorial moments Fq = 



(1/M) X]m(^m)/(^m)'' of Bialas and Peschanski |l[] and their cumulants |jT2|. Here, the 



multiple event averages must be corrected in the same way as the star integral; for example 

and so on for higher order normalizations and cumulants. The inherent instability and large 
error bars found for these moments and cumulants, however, make it doubtful that these 
corrections will make a discernible difference. 



VIII. CONCLUSIONS 

The statistical bias arising through the need for multiple event averages must be un- 
derstood and corrected for. We have shown how the theory of unbiased estimators leads 
to correction formulas for the star integral, thereby making it possible to run it under fast 
algorithms without loss in accuracy. For the envisaged large data samples, this savings 
in CPU time may prove the difference between viability and impossibility of correlation 
measurements in future. 

For truly small samples, the correction for this bias is not a tool for faster analysis 
but constitutive for a correct measurement. Typical small samples are found in cosmic ray 
data and in galaxy correlations as well as the subdivision of inclusive data samples into 
fixed-multiplicity subsamples. All these must take cognizance of the bias and correct for it. 

This brings us to the subject of single-event measurements: event mixing is, of course, 
not possible when there is just one available. For the proposed measurement of Bose- Einstein 
correlations in single events in nuclear collisions at RHIC and LHC, the solution is clearly to 
normalize by event mixing based on a sample of similar events. Most notably, this mixing 
sample should have the same multiplicity and general characteristics; such requirements will 
necessarily restrict the sample to relatively few events, so that the bias corrections may 
become important. 

Galaxy distributions, on the other hand, present a much more difficult task: there is 
no pool of big bang events to make up the uncorrelated background. So far, the preferred 
solution was to assume a uniform distribution on a sufficiently large scale. Recent results 
on the large-scale structure of the universe, however, make this assumption increasingly 
untenable. The only alternative route would appear to be to select a number of windows in 
the sky (with about the same overall galaxy count as the window used for the numerator) 
and, neglecting the long-range correlations, count these as different "events". In this way, 
no assumption of overall uniformity need be made. 
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APPENDIX: UNBIASED ESTIMATORS FOR NORMALIZED MOMENTS 

In this appendix, we derive the correction functions cjq to be used for checking for residual 
correlations between numerator and denominator of the normalized factorial moment and 
cumulant. Our notation will be as follows: we use roman letters a,b, . . . for the event indices 
of the numerator of the normalized moments, and greek letters a, f3, . . . for the denominator. 

1. Reduced event mixing 

The numerator of the biased uncorrected F2 is given by 

^2 = ^Y.ih (Ai) 

^^ev a=l 

i2=T.^ie-X^;), (A2) 



while the denominator is 



Ncv a-1 



e(e?°™) = T^^E E T^P^ (A3) 

JVev^ a=ip=a-A 



with 



Ta, = Y.^{e-Xtf). (A4) 



Note that the inner /3-average (Tq,^) is equal to I]j(&) in the shortened notation of Eq. 

In order to get an unbiased estimator 6(^2) of the normalized second order factorial 
moment, we exclude explicitly from the denominator the event a used in the numerator 
event sum: 

«(A) = ^Ef (A5) 

^^ev a=l J^2 

where the denominator is now a-dependent: 
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7-)a 



1 



iVe- 



1 a— 1 



1 



iVc- 



rnS'i 



and the condition 



-5„ 



a-l 



XI (1 ~ Caa)Taf3 + ^ ^ (1 - 5/3a)C'aa T, 



P=a-A 



a+A 
u=a+l 



A 



aP 



P=a-A-l 



(A6) 



(AT) 



is unity whenever a is in the range [a+1, . . . , a+A\ and zero otherwise (note that Caa^aa = 0). 
The reason for the sphtting of the /?-suni is that whenever a is in this range, the index f3 
must "jump" the a-event, meaning that the count must start at a—A—1. The form (|A6|) 
thus exphcitly excludes the currently-used numerator event a. 

To find the relation between 6(^2) and F2, we factor out of D^ the usual normalization 
and write the remainder in terms of a function ^2 which is to be determined: 



^2 = ei^^^ 



1- 



92 



N^., - 1 



(A8) 



The moment estimator is then a geometric series 



A^ov 



<F2) = ^ E 



K 



ev a=l 



^2- 



■E 

p=0 



^2" 



e(e2"--)„tr,VA'ev 



+ 



^2^^! 



+ 



CAO) 

which usually converges rapidly. The correction function ^2 is found as follows. The quantity 
in the square brackets of Eq. ([A^) yields, on rearrangement. 



a-l 



C 1 

/3=a— A 



SO that 



^2-e(e 



norin'i 



iVcv 1 



t-^aal^-' a. 



a— A 



-1 — Ta 



a-l 



N^., - 1 



+ E 

f3=a-A 



[^ — Saa)Tal3 T, 



aP 



N^., - 1 



N,, 



(AlO) 



(All) 



After changing to index u = P + A + 1, the 6aa term becomes Y^ap ^aaTap = Z]u=f+i ^a,«-A-i, 
which yields, using Eqs. (^) and (|A7|) , 



^2 - e(e?^ 



A/",. 



a+A 



e(e: 



normal 



A 



/ , [''■a,u—A—l -'m,u-A-1 + -'a,Mj 



u=a+l 



SO that we can identify 

92 = -1 + 



a+A 



ei^r^A ^^^, 



/ , (-tajU— A— 1 -tUj-u-A-l + -tajuj 



(A12) 



(A13) 



Implementing this type of correction thus involves keeping the sphere counts of events mixed 
within a range [a — A, . . . , a + A\. 
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2. Full event mixing 

Correcting for bias in the case of full event mixing is somewhat easier than for the 
reduced event mixing above because the /5-sum does not have to be spht up. The equivalent 
definitions are 

and 

1 Nov 

^2 = f.r _iVAr -9^ ^ (^ - ^-)(^ - ^«'3)(1 - ^c.p)TaP , (A15) 

{^^cv J-j(^iVcv ^j a,/3=l 

giving, using the symmetry of T^^, 



iVev-2 

2 

iV.. - 2 



cv a/3 ^'cv -■- ^ 

e{W^)-J2{b)X (A16) 



the second term being an event mixing average performed around tracks i of (numerator) 
event a only, and hence 

.a ^ 2 (^M^ _ i] . (A17) 

y2 \^e(G°™) / 

The difference between this and the reduced event mixing case is that the former keeps 
the "mixing tail" to strictly A events, so that even for the maximal A = Ncv—'2 it always 
leaves out one event in the mixing. The full event mixing outlined above, on the other 
hand, changes from mixing with Ne^—2 events in D2 to iVey— 1 in 6(^2°"^™). The power series 
expansion now reads 

^ 2J - e(^p5r) (jY^^ _ 2) e(G^^) ' " 



3. Corrections for higher order 

For higher orders, a similar prescription would be followed in eliminating bias arising 
from numerator-denominator correlations. The unbiased form is 



With the understanding that all indices ctj are kept strictly unequal to each other throughout, 



A^c. 



Dl 



(iVev- 






Qi —1 



ai —1 



Al?- 



_1] 2-^ V tv(jai j Jai,...,, 



QT,...,ao 






/ , *-^aai -^ c 



aai -^ ai,...,a5 



= ai -A-1 



(A20) 



where 



T, 



ai,02,---,09 



ii,i2,..;iq 



(A21) 



with 0"/j2^ = 9(e — Xj"^"^) as usual. Note that T is symmetric in all indices except ai. 
The idea is then to expand Z)" in terms of the corresponding a-independent normalization 
g(^Qorm'j g^j^^ g^ correction function g^. The unbiased normalized moment is then given by an 
expansion of the form of Eq. ( |A9| ) in powers of iVev— 1. 

By excluding one event from the sum, we are explicitly breaking the symmetry of the 
sphere counts that permitted factorization of the multiple sums, so that the correction 
function g'^ becomes rapidly more complex with q. Here, we merely outline the results for 
third order. We have from Eq. ( |A20| ) , after going through similar steps as for q = 2, 



Dl - e(e: 



norni\ 



iVev-1 



n-1 



<i. 



norm'i 



A\2\ Z^ \^ Z^ [Tu,u- 



A 



■A-1,/3 



Tu,a,l3) 



u=a+l \ f3=u~A 



a-1 

'2Tu^a,a ~ 2^ (1 ~ ^f3,u-A-l)Ta,u-A-l,(3 
f3=a-A 



(A22) 



Note that e(^3°''™) itself is an unbiased estimator obtained from the biased form through 
Eq. (P5). The correction function is then 



^ a+A 



A\:- 



u=a+l 



M-1 



2 2^ (^u,a,/3 — Tu,u-A-l,l3) 
l3=u~A 



a-1 



~ 2{Tu^a,a ~ Tu^a,u-A-l) + Z^ [^ ~ ^l3,u-A-l)Ta,u-A-l,l3 

f3=a-A 



(A23) 



Because the correction g^ applies to the denominator only, cumulants of higher order are 
immediately found from 



\ 1J AT ^-^ p)a 

^"ev a=l ^q 



(E,e(/5)). 



+ 



{r^Hieit)). 



rnorin \ 



e(er™) ■ (A^cv-1) e(e 
For the case of full event mixing, we obtain for the moments 



+ 



(A24) 



<F,) 



(e). 



e(C 



norm 1 
9 '' 



+ 



{i^g:) 



{N,^ - q) e(e---) 
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+ 



(A25) 



where for third order 



I A^c 



^3° = -3+ .^ _.w^ _^. E {Taa^ + 2T^a^) . (A26) 
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APPENDIX: LIST OF FIGURES 



Figure 1: Normalized factorial moments Fg and cumulants Kg for g = 2, . . . , 5 for the split 
track model, with 10% of the points split up into 3 tracks. For the inner event average 
to calculate the n sphere counts, only A = 11 events were used rather than the full event 
mixing of A'^ev = 10,000 events (i.e. shortening the CPU time by a factor ~ 1000). The 
biased moments and cumulants are clearly wrong {K4 and K^ should be zero), while the 
unbiased version are fine. 



Figure 2: Differential moments and cumulants. As the three split tracks are all at the 
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same point, the correlation due to their presence is always contained in the smallest bin; 
this is clearly visible as the single point in the unbiased Ai^2 and Ai^3. 



Figure 3: Full event mixing (using all A'^ev events for the inner event averages) is not a 
useful alternative to bias corrections. As shown here, one needs upwards oi A = 101 events 
in the inner loop to make the biased estimate approach that of the true value K^ = 0; the 
unbiased estimators (filled circles) of K^, on the other hand, all lie close to zero even for 
A = 11 so that this small number is sufficient for a good estimate. CPU time is roughly 
proportional to A, i.e. a factor 9 larger for A = 101 than for A = 11. 
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